05. Assessing Data

The files all_alpha_08.csv and all_alpha_18.csv, discussed on the previous pages, are provided in the workspace. Use pandas to explore both datasets in the Jupyter Notebook below, then answer the quiz questions that follow the notebook about these characteristics of the data:

  • number of samples in each dataset
  • number of columns in each dataset
  • duplicate rows in each dataset
  • datatypes of columns
  • features with missing values
  • number of non-null unique values for features in each dataset
  • what those unique values are and counts for each
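
The first few characteristics in the list above can be checked with a handful of pandas calls. The sketch below is a minimal illustration, not the course's solution: the inline CSV text, its column names and values, and the `summarize` helper are hypothetical stand-ins so that the example runs without the workspace files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a few rows of all_alpha_08.csv; the real file
# has its own columns and many more rows.
TOY_CSV = """Model,Cyl,Fuel,Cmb MPG
ACURA MDX,(6 cyl),Gasoline,17
ACURA MDX,(6 cyl),Gasoline,17
ACURA RDX,(4 cyl),Gasoline,19
AUDI A3,,Gasoline,24
"""


def summarize(df):
    """Collect sample count, column count, duplicates, and missing features."""
    return {
        "samples": df.shape[0],
        "columns": df.shape[1],
        # duplicated() flags every repeat of an earlier, identical row
        "duplicate_rows": int(df.duplicated().sum()),
        # columns in which at least one value is missing
        "features_with_missing": df.columns[df.isnull().any()].tolist(),
    }


# In the workspace this would be pd.read_csv("all_alpha_08.csv") instead.
df_toy = pd.read_csv(io.StringIO(TOY_CSV))
summary = summarize(df_toy)
print(summary)
```

Running the same helper on both real dataframes would give the figures needed for the first two quizzes, assuming the files load with the default `read_csv` settings.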

Workspace

This section contains a workspace (a Jupyter Notebook, an online code editor, or similar) that cannot be downloaded automatically and reproduced here. Please log in to the classroom with your account and download the workspace to your local machine manually. For some courses, Udacity uploads the workspace files to https://github.com/udacity, so you may be able to download them there.

Workspace Information:

  • Default file path:
  • Workspace type: jupyter
  • Opened files (when workspace is loaded): n/a

QUIZ QUESTION:

Find the correct count for each of the following in the 2008 dataset

ANSWER CHOICES (Count):

  • 3889
  • 2404
  • 4
  • 199
  • 26
  • 25
  • 1611
  • 18
  • 1

SOLUTION (Count for each Feature):

  • 2404
  • 199
  • 25
  • 18

QUIZ QUESTION:

Find the correct count for each of the following in the 2018 dataset

ANSWER CHOICES (Count):

  • 15
  • 1611
  • 18
  • 2404
  • 2
  • 32
  • 0

SOLUTION (Count for each Feature):

  • 1611
  • 18
  • 2
  • 0

QUIZ QUESTION:

Match the datatype for each feature (some of these may not be ideal)

ANSWER CHOICES (Datatype):

  • bool
  • float
  • float
  • string
  • int
  • string
  • bool
  • string
  • int

SOLUTION (Datatype for each Feature):

  • float
  • float
  • string
  • string
  • string
  • int
  • string
  • string
  • string
  • string
  • string
  • string
  • int
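
One way to read off datatypes like these is to inspect `df.dtypes` and map each pandas dtype onto the coarse labels used in the quiz. The following is a sketch under assumptions: the toy dataframe, its column names, and the `describe_dtype` helper are hypothetical. pandas typically reports text columns as `object` (or a dedicated string dtype, depending on the version), which is why the helper falls back to "string".

```python
import pandas as pd
from pandas.api import types

# Hypothetical toy frame; column names and values are illustrative only.
df = pd.DataFrame({
    "Model": ["ACURA MDX", "ACURA RDX"],
    "Displ": [3.7, 2.3],
    "Cyl": ["(6 cyl)", "(4 cyl)"],
    "Greenhouse Gas Score": [4, 5],
})


def describe_dtype(series):
    """Map a column's pandas dtype onto bool / int / float / string."""
    if types.is_bool_dtype(series):
        return "bool"
    if types.is_integer_dtype(series):
        return "int"
    if types.is_float_dtype(series):
        return "float"
    # Anything else (object or string dtype) is treated as text here.
    return "string"


labels = {col: describe_dtype(df[col]) for col in df.columns}
print(df.dtypes)  # raw pandas dtypes, one per column
print(labels)     # coarse labels comparable to the quiz choices
```

A column holding numbers written as text, such as the toy `Cyl` values here, comes out as "string", which is the kind of less-than-ideal datatype the question hints at.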

QUIZ QUESTION:

Match the number of non-null unique values for each of the following features

ANSWER CHOICES (Unique Values):

  • 3
  • 2
  • 5
  • 2
  • 42
  • 1
  • 18
  • 14
  • 3

SOLUTION (Unique Values for each Feature):

  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 14
  • 3
  • 3
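
Counts of non-null unique values can be obtained with `DataFrame.nunique()`, which skips missing values by default, and the values themselves with `Series.value_counts()`. The snippet below is only a sketch: the toy dataframe and its column names and values are hypothetical and do not reproduce the real datasets.

```python
import pandas as pd

# Hypothetical toy frame with one missing value in "SmartWay".
df = pd.DataFrame({
    "SmartWay": ["yes", "no", "yes", None],
    "Sales Area": ["CA", "FA", "FC", "CA"],
})

# Number of distinct non-null values per column (dropna=True is the default).
unique_counts = df.nunique()

# The distinct values of one column and how often each occurs;
# missing values are also left out by default.
smartway_counts = df["SmartWay"].value_counts()

print(unique_counts)
print(smartway_counts)
```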

How do the Cyl columns in the 2008 and 2018 datasets differ?

SOLUTION:
  • Datatypes
  • Format
  • Number of unique values
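
A side-by-side report makes differences like these easy to spot. The sketch below assumes two dataframes that share a column name. The toy `Cyl` values are hypothetical, chosen only to show a mismatch in datatype, format, and number of unique values, and `compare_column` is an illustrative helper rather than part of pandas.

```python
import pandas as pd

# Hypothetical toy values; the real columns should be inspected directly.
df_08 = pd.DataFrame({"Cyl": ["(4 cyl)", "(6 cyl)", "(6 cyl)"]})
df_18 = pd.DataFrame({"Cyl": [4.0, 6.0, 8.0]})


def compare_column(left, right, column):
    """Report dtype, unique count, and a few example values per frame."""
    report = {}
    for name, frame in (("left", left), ("right", right)):
        col = frame[column]
        report[name] = {
            "dtype": str(col.dtype),
            "n_unique": int(col.nunique()),
            # up to three distinct non-null values, to show the format
            "examples": list(col.dropna().unique()[:3]),
        }
    return report


cyl_report = compare_column(df_08, df_18, "Cyl")
print(cyl_report)
```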

QUIZ QUESTION:

Where are each of these fuel types present?

ANSWER CHOICES (Dataset):

  • Both
  • 2018
  • Both
  • 2008
  • 2008
  • Neither
  • Both
  • 2018

SOLUTION (Dataset for each Fuel Type):

  • Both
  • Both
  • 2018
  • 2018
  • Both
  • Both
  • 2008
  • 2008
  • Both
  • Both
  • 2018
  • 2018
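
Questions of this kind reduce to comparing the sets of unique values in the two fuel columns. The sketch below assumes a shared column named `Fuel`. The toy fuel values are hypothetical and the `where_present` helper is illustrative, not the course's solution.

```python
import pandas as pd

# Hypothetical toy values; they do not reproduce the real datasets.
df_08 = pd.DataFrame({"Fuel": ["Gasoline", "diesel", "CNG", None]})
df_18 = pd.DataFrame({"Fuel": ["Gasoline", "Diesel", "Electricity"]})


def where_present(left, right, column, left_label="2008", right_label="2018"):
    """Label each distinct value as present in the left, right, or both frames."""
    left_vals = set(left[column].dropna().unique())
    right_vals = set(right[column].dropna().unique())
    result = {}
    for value in sorted(left_vals | right_vals):
        if value in left_vals and value in right_vals:
            result[value] = "Both"
        elif value in left_vals:
            result[value] = left_label
        else:
            result[value] = right_label
    return result


presence = where_present(df_08, df_18, "Fuel")
print(presence)
```

The comparison is an exact string match, so values that differ only in capitalization (the toy "diesel" and "Diesel" here) are reported as separate fuel types.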